placement algorithm
Cluster Topology-Driven Placement of Experts Reduces Network Traffic in MoE Inference
Sivtsov, Danil, Katrutsa, Aleksandr, Oseledets, Ivan
Efficient deployment of a pre-trained LLM to a cluster with multiple servers is a critical step for providing fast responses to users' queries. The recent success of Mixture-of-Experts (MoE) LLMs raises the question of how to deploy them efficiently, considering their underlying structure. During inference in MoE LLMs, only a small subset of the experts is selected to process a given token. Moreover, in practice, the experts' load is highly imbalanced. For efficient deployment, one has to distribute the model across a large number of servers using a model placement algorithm. Thus, to improve cluster utilization, the model placement algorithm has to take into account the network topology. This work focuses on the efficient topology-aware placement of pre-trained MoE LLMs at the inference stage. We propose an integer linear program (ILP) that determines the optimal placement of experts, minimizing the expected number of transmissions. Due to its internal structure, this optimization problem can be solved with a standard ILP solver. We demonstrate that the ILP-based placement strategy yields lower network traffic than competitors for small-scale (DeepSeekMoE 16B) and large-scale (DeepSeek-R1 671B) models.
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
- Asia > Russia (0.05)
- North America > United States (0.04)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)
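The topology-aware expert placement described in the abstract above can be illustrated on a toy instance. The sketch below is not the authors' ILP, whose formulation is not given here. It assumes a much simpler hypothetical model: each expert has a routing probability, each server holds a limited number of experts, and every token enters at server 0, so the cost of an expert is its probability times the hop distance of its server. All names (`best_placement`, `loads`, `capacity`, `hops`) are illustrative, and exhaustive search stands in for an ILP solver, which is only feasible at this tiny scale.

```python
from itertools import product


def best_placement(loads, capacity, hops):
    """Exhaustively search expert-to-server assignments (toy model).

    loads[e] : assumed probability that a token is routed to expert e
    capacity : assumed maximum number of experts one server can hold
    hops[s]  : assumed hop distance from the entry server (server 0) to server s
    Returns (expected transmissions per token, tuple mapping expert -> server).
    """
    n_servers = len(hops)
    best_cost, best_assign = float("inf"), None
    for assign in product(range(n_servers), repeat=len(loads)):
        # Skip assignments that overload any server.
        if any(assign.count(s) > capacity for s in range(n_servers)):
            continue
        # Expected hops per token under this assignment.
        cost = sum(load * hops[s] for load, s in zip(loads, assign))
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign


# Imbalanced expert load, three servers at increasing distance.
loads = [0.5, 0.3, 0.15, 0.05]
hops = [0, 1, 2]
cost, assign = best_placement(loads, capacity=2, hops=hops)
print(cost, assign)
```

In this toy model the most heavily loaded experts end up on the server closest to where tokens arrive. This matches the intuition in the abstract that imbalanced load and network topology should be considered together.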
Open3DBench: Open-Source Benchmark for 3D-IC Backend Implementation and PPA Evaluation
Shi, Yunqi, Gao, Chengrui, Ren, Wanqi, Xu, Siyuan, Xue, Ke, Yuan, Mingxuan, Qian, Chao, Zhou, Zhi-Hua
This work introduces Open3DBench, an open-source 3D-IC backend implementation benchmark built upon the OpenROAD-flow-scripts framework, enabling comprehensive evaluation of power, performance, area, and thermal metrics. Our proposed flow supports modular integration of 3D partitioning, placement, 3D routing, RC extraction, and thermal simulation, aligning with advanced 3D flows that rely on commercial tools and in-house scripts. We present two foundational 3D placement algorithms: Open3D-Tiling, which emphasizes regular macro placement, and Open3D-DMP, which enhances wirelength optimization through cross-die co-placement with the analytical placer DREAMPlace. Experimental results show significant improvements in area (51.19%), wirelength (24.06%), timing (30.84%), and power (5.72%) compared to 2D flows. The results also highlight that better wirelength does not necessarily lead to PPA gain, emphasizing the need to develop PPA-driven methods. Open3DBench offers a standardized, reproducible platform for evaluating 3D EDA methods, effectively bridging the gap between open-source tools and commercial solutions in 3D-IC design.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Information Technology > Software (0.91)
- Information Technology > Artificial Intelligence > Machine Learning (0.46)
Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms
Wang, Zhihai, Geng, Zijie, Tu, Zhaojie, Wang, Jie, Qian, Yuxi, Xu, Zhexuan, Liu, Ziyan, Xu, Siyuan, Tang, Zhentao, Kai, Shixiong, Yuan, Mingxuan, Hao, Jianye, Li, Bin, Zhang, Yongdong, Wu, Feng
The increasing complexity of modern very-large-scale integration (VLSI) design highlights the significance of Electronic Design Automation (EDA) technologies. Chip placement is a critical step in the EDA workflow, which positions chip modules on the canvas with the goal of optimizing performance, power, and area (PPA) metrics of final chip designs. Recent advances have demonstrated the great potential of AI-based algorithms in enhancing chip placement. However, due to the lengthy workflow of chip design, the evaluations of these algorithms often focus on intermediate surrogate metrics, which are easy to compute but frequently reveal a substantial misalignment with the end-to-end performance (i.e., the final design PPA). To address this challenge, we introduce ChiPBench, which can effectively facilitate research in chip placement within the AI community. ChiPBench is a comprehensive benchmark specifically designed to evaluate the effectiveness of existing AI-based chip placement algorithms in improving final design PPA metrics. Specifically, we have gathered 20 circuits from various domains (e.g., CPU, GPU, and microcontrollers). These designs are compiled by executing the workflow from the Verilog source code, which preserves necessary physical implementation kits, enabling evaluation of the placement algorithms' impact on the final design PPA. We executed six state-of-the-art AI-based chip placement algorithms on these designs and plugged the results of each single-point algorithm into the physical implementation workflow to obtain the final PPA results. Experimental results show that even when the intermediate metric of a single-point algorithm is dominant, the final PPA results are unsatisfactory. We believe that our benchmark will serve as an effective evaluation framework to bridge the gap between academia and industry.
- Asia > Middle East > Israel (0.04)
- Europe (0.04)
- Asia > China > Tianjin Province > Tianjin (0.04)
- Workflow (1.00)
- Research Report > New Finding (0.34)
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Li, Zhuohan, Zheng, Lianmin, Zhong, Yinmin, Liu, Vincent, Sheng, Ying, Jin, Xin, Huang, Yanping, Chen, Zhifeng, Zhang, Hao, Gonzalez, Joseph E., Stoica, Ion
Model parallelism is conventionally viewed as a method to scale a single large deep learning model beyond the memory limits of a single device. In this paper, we demonstrate that model parallelism can be additionally used for the statistical multiplexing of multiple devices when serving multiple models, even when a single model can fit into a single device. Our work reveals a fundamental trade-off between the overhead introduced by model parallelism and the opportunity to exploit statistical multiplexing to reduce serving latency in the presence of bursty workloads. We explore the new trade-off space and present a novel serving system, AlpaServe, that determines an efficient strategy for placing and parallelizing collections of large deep learning models across a distributed cluster. Evaluation results on production workloads show that AlpaServe can process requests at up to 10x higher rates or 6x more burstiness while staying within latency constraints for more than 99% of requests.
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > New York (0.04)
VLSI cell placement techniques
The VLSI cell placement problem is known to be NP-complete. This paper presents a survey of the various approaches and techniques for this problem. It also gives a comprehensive tutorial on the subject, providing an excellent introduction to the terminology and classification of placement algorithms. With the growing diversity of the terms appearing in the literature, I found the explicit warning about synonymous usage of words like module, cell, and element or net, wire, and interconnect to be helpful. The placement algorithms whose emphasis is on standard cell and macro placement fall into five groups, according to their underlying technique: (1) simulated annealing, (2) force-directed, (3) minimum-cut, (4) numerical optimization, and (5) evolution-based. The origins of the first two are in physical laws.
- Overview (0.96)
- Instructional Material (0.63)